Water Observations from Space (WOFS)¶

This notebook demonstrates the Australian Water Observations from Space (WOFS) algorithm. This water-detection algorithm improves on both the Landsat QA water flag and the NDWI index for identifying water. For more information, visit this website:

http://www.ga.gov.au/scientific-topics/hazards/flood/wofs
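For context, the NDWI mentioned above is a simple band ratio, (green − NIR) / (green + NIR), thresholded to flag water. A minimal sketch of that baseline follows; the 0.0 threshold and the reflectance values are illustrative assumptions, not part of WOFS:

```python
def ndwi(green, nir):
    """Normalised Difference Water Index: (green - nir) / (green + nir)."""
    return (green - nir) / (green + nir)

def is_water_ndwi(green, nir, threshold=0.0):
    """Naive water test: NDWI above a chosen (illustrative) threshold."""
    return ndwi(green, nir) > threshold

# Water is bright in green and dark in NIR, so NDWI is positive over water.
print(is_water_ndwi(green=0.10, nir=0.02))  # open water -> True
print(is_water_ndwi(green=0.08, nir=0.30))  # vegetation -> False
```

WOFS instead uses a decision tree over several bands plus terrain shading, which is why it outperforms this single-index test.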

This notebook uses a downloaded copy of the GA WOFS algorithm from https://github.com/GeoscienceAustralia/wofs

Install WOFS package¶

This should only need to be run once. Once it is finished, make sure that you refresh your browser and select the new 'wofs' kernel from the kernel selector at the top right.

In [1]:
# !sh ../bin/install_wofs.sh

Import Dependencies and Connect to the Data Cube ▴¶

In [2]:
import datacube
dc = datacube.Datacube(app='Water_Observations_from_Space')
from datacube.utils import masking

import sys, os
os.environ['USE_PYGEOS'] = '0'
from pathlib import Path

import datetime
import matplotlib.pyplot as plt
import numpy as np  
import xarray as xr
import rioxarray
import pandas as pd

from dea_tools.plotting import rgb, display_map
from dea_tools.bandindices import calculate_indices

### EASI tools
sys.path.append(os.path.expanduser('../scripts'))
from ceos_utils.data_cube_utilities.clean_mask import landsat_clean_mask_invalid, landsat_qa_clean_mask
from easi_tools import EasiDefaults
from easi_tools import notebook_utils
easi = EasiDefaults() # Get the default parameters for this system
Successfully found configuration for deployment "eail"
In [3]:
cluster, client = notebook_utils.initialize_dask(use_gateway=False)
display(cluster if cluster else client)
print(notebook_utils.localcluster_dashboard(client, server=easi.hub))

LocalCluster (4 workers, 8 total threads, 29.00 GiB total memory)
Dashboard: http://127.0.0.1:8787/status
https://hub.eail.easi-eo.solutions/user/jhodge/proxy/8787/status
In [4]:
from datacube.utils.aws import configure_s3_access
configure_s3_access(aws_unsigned=False, requester_pays=True, client=client)
Out[4]:
<botocore.credentials.Credentials at 0x7f3e5eba9f90>

Choose Platforms and Products ▴¶

In [5]:
# Define the Product
product = "landsat8_c2l2_sr"

Define the Extents of the Analysis ▴¶

In [6]:
# Select an analysis region (Latitude-Longitude) 
# Select a time period within the extents of the dataset (Year-Month-Day)

# Mombasa, Kenya
# latitude = (-4.05, -3.95) 
# longitude = (39.60, 39.68) 

# latitude=easi.latitude
# longitude=easi.longitude
# latitude = (36.3, 36.5)
# longitude = (-114.3, -114.5)

# For this, we will deliberately use UTM projected coordinates, as there
# appears to be a bug in the WOFS code when the area of interest has
# different sizes in the x and y dimensions
from pyproj import Proj, CRS, Transformer
crs = CRS.from_epsg(32611)
to_utm = Transformer.from_crs(crs.geodetic_crs, crs)
to_latlong = Transformer.from_crs(crs, crs.geodetic_crs)
utm = to_utm.transform(36.38,-114.4)
buffer = 12000 # set the buffer size in m

# Convert back to latitudes and longitudes to visualise the area
topleft = to_latlong.transform(utm[0]-buffer,utm[1]+buffer)
bottomright = to_latlong.transform(utm[0]+buffer,utm[1]-buffer)
latitude = (bottomright[0],topleft[0])
longitude = (topleft[1],bottomright[1])

# Define Time Range
# Landsat-8 time range: 07-Apr-2013 to current
time_extents = ('2021-01-01', '2021-12-31')
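The buffer logic above can be sanity-checked without pyproj using a flat-earth approximation (about 111,320 m per degree of latitude, with longitude degrees shrinking by cos(latitude)). The helper below is a hypothetical illustration of that approximation; the notebook's pyproj round-trip should be preferred for real work:

```python
import math

def approx_bounds(lat, lon, buffer_m):
    """Approximate a square lat/lon bounding box around a point.

    Uses ~111,320 m per degree of latitude; a degree of longitude is
    shorter by cos(latitude). Good enough to sanity-check a projected
    buffer, not a substitute for a proper projection.
    """
    dlat = buffer_m / 111_320
    dlon = buffer_m / (111_320 * math.cos(math.radians(lat)))
    return (lat - dlat, lat + dlat), (lon - dlon, lon + dlon)

latitude, longitude = approx_bounds(36.38, -114.4, 12_000)
print(latitude, longitude)
```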
In [7]:
# The code below renders a map that can be used to view the analysis region.
display_map(longitude,latitude)
/env/lib/python3.10/site-packages/dea_tools/plotting.py:313: FutureWarning: This function is deprecated. See: https://pyproj4.github.io/pyproj/stable/gotchas.html#upgrading-to-pyproj-2-from-pyproj-1
  all_longitude, all_latitude = transform(Proj(crs), Proj('EPSG:4326'), all_x,
Out[7]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Load and Clean Data from the Data Cube ▴¶

After loading, you will view the Xarray dataset. Notice the dimensions represent the number of pixels in the x and y dimensions as well as the number of time slices (time) in your time series.

In [8]:
measurements = ['red', 'green', 'blue', 'nir', 'swir1', 'swir2', 'pixel_qa']
data_names = measurements.copy()
data_names.remove('pixel_qa')
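The copy-then-remove above can equivalently be written as a list comprehension, which leaves the original list untouched without an explicit copy:

```python
measurements = ['red', 'green', 'blue', 'nir', 'swir1', 'swir2', 'pixel_qa']

# Keep every data band, excluding the QA band used only for masking.
data_names = [m for m in measurements if m != 'pixel_qa']
print(data_names)
```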
In [9]:
landsat_dataset = dc.load(y = (utm[1]-buffer,utm[1]+buffer),
                          x = (utm[0]-buffer,utm[0]+buffer),
                          time = time_extents,
                          product = product,
                          crs = 'EPSG:32611',
                          output_crs = 'EPSG:32611',
                          resolution = (-30,30),
                          measurements = measurements,
                          dask_chunks = {'time':1},
                          group_by = 'solar_day') 
In [10]:
landsat_dataset
Out[10]:
<xarray.Dataset>
Dimensions:      (time: 32, y: 800, x: 801)
Coordinates:
  * time         (time) datetime64[ns] 2021-01-13T18:15:45.783864 ... 2021-12...
  * y            (y) float64 4.041e+06 4.041e+06 ... 4.017e+06 4.017e+06
  * x            (x) float64 7.212e+05 7.212e+05 ... 7.452e+05 7.452e+05
    spatial_ref  int32 32611
Data variables:
    red          (time, y, x) uint16 dask.array<chunksize=(1, 800, 801), meta=np.ndarray>
    green        (time, y, x) uint16 dask.array<chunksize=(1, 800, 801), meta=np.ndarray>
    blue         (time, y, x) uint16 dask.array<chunksize=(1, 800, 801), meta=np.ndarray>
    nir          (time, y, x) uint16 dask.array<chunksize=(1, 800, 801), meta=np.ndarray>
    swir1        (time, y, x) uint16 dask.array<chunksize=(1, 800, 801), meta=np.ndarray>
    swir2        (time, y, x) uint16 dask.array<chunksize=(1, 800, 801), meta=np.ndarray>
    pixel_qa     (time, y, x) uint16 dask.array<chunksize=(1, 800, 801), meta=np.ndarray>
Attributes:
    crs:           EPSG:32611
    grid_mapping:  spatial_ref
In [11]:
# Where to save the DEM fetched in ODC
DEM_PATH = "dem_for_wofs.tif"
In [12]:
# Load the elevation data
from os import environ
from cartopy.crs import PlateCarree
from datacube import Datacube
from datashader import reductions
import holoviews as hv
import hvplot.xarray
import matplotlib.pyplot as plt

dem = dc.load(
    product="copernicus_dem_30", 
    y = (utm[1]-buffer,utm[1]+buffer),
    x = (utm[0]-buffer,utm[0]+buffer),
    crs = "epsg:32611",
    output_crs="epsg:32611", 
    resolution=(-30, 30)
)
elevation = dem.elevation.squeeze()
In [13]:
# Have a look at the DEM data
options = {
    'title': 'Elevation',
    'frame_width': 400,
    'frame_height': 400,
    'aspect': 'equal',
    'cmap': plt.cm.terrain,
    'clim': (elevation.min().values.item(), elevation.max().values.item()),    # Limit the color range
    'colorbar': True,
    'tools': ['hover'],
}
plot_crs = 'epsg:32611'
elevation.hvplot.image(
     x = 'x', y = 'y',         # Dataset x,y dimension names 
     crs = plot_crs,
     rasterize = True,                        # If False, data will not be reduced. This is slow to load but all data is loaded.
     aggregator = reductions.mean(),          # Datashader calculates the mean value for reductions (also first, min, max, last, std, mode)
     precompute = True,                       # Datashader precomputes what it can
    ).opts(**options).hist(bin_range = options['clim'])
Out[13]:
In [14]:
dem_path = Path(DEM_PATH)
dem_path.parent.mkdir(parents=True, exist_ok=True)
elevation.rio.to_raster(dem_path)
In [15]:
# clear_mask = masking.make_mask(landsat_dataset['pixel_qa'], clear='clear')
# water_mask = masking.make_mask(landsat_dataset['pixel_qa'], water='water')
# cloud_mask = masking.make_mask(landsat_dataset['pixel_qa'], cloud='not_high_confidence', cloud_shadow='not_high_confidence')
# clean_mask = (clear_mask | water_mask) & cloud_mask

# good_data = landsat_dataset[data_names].where(clean_mask)
# # good_data.red.plot(col='time',col_wrap=6)
In [16]:
# from ceos_utils.data_cube_utilities.clean_mask import landsat_qa_clean_mask
# cloud_mask_ceos = landsat_qa_clean_mask(landsat_dataset, platform="LANDSAT_8", collection="c2")
In [17]:
# cloud_mask = np.bitwise_and(landsat_dataset.pixel_qa,64).astype(bool)
# cleaned_dataset = landsat_dataset.drop('pixel_qa').where(cloud_mask)
# cleaned_dataset.red.plot(col='time',col_wrap=4)
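The commented-out masking cells above all decode the same `pixel_qa` bitfield. Per the `flags_definition` shown in the loaded dataset, bit 6 is "clear", bit 7 is "water", and bit 3 is high-confidence "cloud". A minimal pure-Python sketch of the clear-or-water, not-cloud test (`keep_pixel` is an illustrative helper, not part of any library):

```python
# Landsat Collection 2 QA_PIXEL bit positions, from the dataset's
# flags_definition: bit 6 = clear, bit 7 = water, bit 3 = cloud.
QA_CLEAR = 1 << 6   # 64
QA_WATER = 1 << 7   # 128
QA_CLOUD = 1 << 3   # 8

def keep_pixel(qa):
    """True if the pixel is flagged clear or water and not high-confidence cloud."""
    return bool(qa & (QA_CLEAR | QA_WATER)) and not (qa & QA_CLOUD)

print(keep_pixel(64))        # clear land  -> True
print(keep_pixel(128 + 64))  # clear water -> True
print(keep_pixel(8))         # cloudy      -> False
```

In the notebook these tests are applied array-wise via `masking.make_mask` or `np.bitwise_and`, as the commented cells show.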

Time Series Water Detection Analysis ▴¶

Time-series output of the Australian Water Observations from Space (WOFS) algorithm. The results show the percentage of time that each pixel is classified as water over the entire time series. BLUE = frequent water, RED = infrequent water.
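The per-pixel summary described above amounts to: count wet observations over clear (wet + dry) observations, excluding obscured ones. A toy pure-Python restatement for a single pixel, with made-up observation labels:

```python
def water_frequency(observations):
    """Fraction of clear observations classified as water.

    observations: per-time labels for one pixel, each 'wet', 'dry' or 'bad'.
    'bad' (cloud, shadow, etc.) observations are excluded from the denominator.
    """
    wet = observations.count('wet')
    clear = wet + observations.count('dry')
    return wet / clear if clear else float('nan')

# A pixel on a lake edge: wet part of the year, occasionally obscured.
print(water_frequency(['wet', 'wet', 'dry', 'bad', 'wet', 'dry']))  # 3/5 = 0.6
```

The notebook computes the same ratio array-wise over the whole scene using the `wet`, `dry` and `bad` masks built later from the WOFS output.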

In [18]:
from wofs.virtualproduct import WOfSClassifier
In [19]:
# Rename some variables so that the GA algorithm works
landsat_dataset = landsat_dataset.rename_vars({
    "blue": "nbart_blue",
    "green": "nbart_green",
    "red": "nbart_red",
    "nir": "nbart_nir",
    "swir1": "nbart_swir_1",
    "swir2": "nbart_swir_2",
    "pixel_qa": "fmask",
})
landsat_dataset
Out[19]:
<xarray.Dataset>
Dimensions:       (time: 32, y: 800, x: 801)
Coordinates:
  * time          (time) datetime64[ns] 2021-01-13T18:15:45.783864 ... 2021-1...
  * y             (y) float64 4.041e+06 4.041e+06 ... 4.017e+06 4.017e+06
  * x             (x) float64 7.212e+05 7.212e+05 ... 7.452e+05 7.452e+05
    spatial_ref   int32 32611
Data variables:
    nbart_red     (time, y, x) uint16 dask.array<chunksize=(1, 800, 801), meta=np.ndarray>
    nbart_green   (time, y, x) uint16 dask.array<chunksize=(1, 800, 801), meta=np.ndarray>
    nbart_blue    (time, y, x) uint16 dask.array<chunksize=(1, 800, 801), meta=np.ndarray>
    nbart_nir     (time, y, x) uint16 dask.array<chunksize=(1, 800, 801), meta=np.ndarray>
    nbart_swir_1  (time, y, x) uint16 dask.array<chunksize=(1, 800, 801), meta=np.ndarray>
    nbart_swir_2  (time, y, x) uint16 dask.array<chunksize=(1, 800, 801), meta=np.ndarray>
    fmask         (time, y, x) uint16 dask.array<chunksize=(1, 800, 801), meta=np.ndarray>
Attributes:
    crs:           EPSG:32611
    grid_mapping:  spatial_ref
In [20]:
# Prepare the classifier
ts_water_classification = WOfSClassifier(c2_scaling=True,dsm_path=DEM_PATH,dsm_no_data=-32767)
In [21]:
# Run the classification. There might be some warnings about invalid values.
wofl = ts_water_classification.compute(landsat_dataset)
In [22]:
# Rename dimensions as required
wofl = wofl.rename({"x": "longitude", "y": "latitude"})
In [23]:
# Now categorise the data based on the classifier output
from odc.algo import safe_div, apply_numexpr, keep_good_only

wofl["bad"] = (wofl.water & 0b0111_1110) > 0
wofl["some"] = apply_numexpr("((water<<30)>>30)==0", wofl, name="some")
wofl["dry"] = wofl.water == 0
wofl["wet"] = wofl.water == 128
wofl = wofl.drop_vars("water")
for dv in wofl.data_vars.values():
    dv.attrs.pop("nodata", None)
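The bitwise and numexpr expressions above implement a three-way split of the WOfS bitfield: any of bits 1–6 set (the `0b0111_1110` mask) means the observation is unusable, 0 means clear and dry, and 128 (bit 7) means clear and wet. A simplified plain-Python restatement of those masks (it omits the `some` test on the low bits that the cell above also applies):

```python
def classify_wofs(word):
    """Categorise one WOfS bitfield value, mirroring the masks above."""
    if word & 0b0111_1110:       # bits 1-6: cloud, shadow, terrain, etc.
        return 'bad'
    return 'wet' if word == 128 else 'dry'

print(classify_wofs(0))    # clear, no water -> 'dry'
print(classify_wofs(128))  # clear water     -> 'wet'
print(classify_wofs(64))   # masked          -> 'bad'
```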
In [24]:
# Run all the calculations and load into memory
wofl = wofl.compute()
/home/jovyan/venvs/wofs/lib/python3.10/site-packages/wofs/classifier.py:81: RuntimeWarning: divide by zero encountered in divide
  c = (a - b) / (a + b)
/env/lib/python3.10/site-packages/rasterio/warp.py:344: NotGeoreferencedWarning: Dataset has no geotransform, gcps, or rpcs. The identity matrix will be returned.
  _reproject(

IMPORTANT NOTE:

The images below show clear examples of WOFS failing. In many scenes a ring of blue (water detected) surrounds the edge of the lake, while the centre of the lake is black (not water) or speckled with blue.

More commentary on this below.

In [25]:
# Have a look at the data
import matplotlib as mpl
from matplotlib.colors import LinearSegmentedColormap

water_cmap = LinearSegmentedColormap.from_list('water_class', ['#32373B', '#1789FC'], N=2)
fig = wofl.wet.plot(col="time", col_wrap=7, size=3, aspect=1,cmap=water_cmap)
for ax in fig.axs.flat:
    ax.xaxis.set_visible(False) # remove tile axes
    ax.yaxis.set_visible(False) # remove tile axes
    ax.set_title(ax.get_title().replace('time = ',''), fontsize=10)  # clean up tile titles
fig.cbar.ax.set_yticks(ticks=[0.25,0.75])
fig.cbar.ax.set_yticklabels(['0 - Not water', '1 - Water'],rotation='vertical',verticalalignment='center')
fig.cbar.ax.set_ylabel(None)
fig
Out[25]:
<xarray.plot.facetgrid.FacetGrid at 0x7f3e5ced81c0>
In [26]:
# Compare a couple of specific scenes to investigate further

date_1 = '2021-01-13'
date_2 = '2021-03-02'

fig = plt.figure(figsize=(14, 14))
ax1 = fig.add_subplot(2,2,1, aspect = "equal")
ax2 = fig.add_subplot(2,2,2, aspect = "equal")
ax3 = fig.add_subplot(2,2,3, aspect = "equal")
ax4 = fig.add_subplot(2,2,4, aspect = "equal")

true_1 = landsat_dataset[['nbart_red','nbart_green','nbart_blue']].sel(time=date_1,method='nearest').to_array().plot.imshow(ax=ax1,robust=True)
wet_1 = wofl.wet.sel(time=date_1,method='nearest').plot(ax=ax2,add_colorbar=False,cmap=water_cmap)
true_2 = landsat_dataset[['nbart_red','nbart_green','nbart_blue']].sel(time=date_2,method='nearest').to_array().plot.imshow(ax=ax3,robust=True)
wet_2 = wofl.wet.sel(time=date_2,method='nearest').plot(ax=ax4,add_colorbar=False,cmap=water_cmap)
ax1.set_title(f'True Color - good classification ({date_1})'), ax1.xaxis.set_visible(False), ax1.yaxis.set_visible(False)
ax2.set_title(f'Water classification - good classification ({date_1})'), ax2.xaxis.set_visible(False), ax2.yaxis.set_visible(False)
ax3.set_title(f'True Color - poor classification ({date_2})'), ax3.xaxis.set_visible(False), ax3.yaxis.set_visible(False)
ax4.set_title(f'Water classification - poor classification ({date_2})'), ax4.xaxis.set_visible(False), ax4.yaxis.set_visible(False)
# Forcing the colorbar to be added separately so that it doesn't change the figure sizes
ax_cbar1 = fig.add_axes([1, 0.4875, 0.02, 0.4625])
ax_cbar2 = fig.add_axes([1, 0, 0.02, 0.4625])
cbar1 = fig.colorbar(wet_1,cax=ax_cbar1,ticks=[0.25,0.75])
cbar1.ax.set_yticklabels(['0 - Not water', '1 - Water'],rotation='vertical',verticalalignment='center')
cbar2 = fig.colorbar(wet_2,cax=ax_cbar2,ticks=[0.25,0.75])
cbar2.ax.set_yticklabels(['0 - Not water', '1 - Water'],rotation='vertical',verticalalignment='center')

plt.subplots_adjust(left=0,bottom=0,right=0.95,top=0.95,wspace=0.05,hspace=0.05) # tight_layout() doesn't work when using add_axes()
plt.show()

More information:

As can be seen above, both selected dates are clear images with no cloud or atmospheric interference, yet in the second pair the classification fails in the middle of the lake. The lake should be classified entirely as water, as in the first scene. This appears to happen in deep, clear (dark) water bodies like this lake, and is common in Chile as well.

There is a comment in the original WOFS paper regarding excessive noise over large lakes, but there may also be issues with parameterisation in clear/dark water bodies when there is insufficient contrast between some bands.

Copied from https://doi.org/10.1016/j.rse.2015.11.003:

A significant issue for large water bodies is signal noise for very clear water (Nichol and Vohora 2004). Data values in areas of very clear water are extremely low, often only 1 to 2 DN in the uncorrected Landsat data. This results in corresponding low values once the surface reflectance correction has been implemented, with additional issues from any error in the ancillary data used to produce the correction. As such it becomes possible for the noise to exceed the measurement by the Thematic Mapper sensor and hence the observed spectra to indicate that the target is not water. The observed values of NDI_43 and NDI_52 (see Table 2) can easily result in a water pixel in the centre of a lake being detected as not-water as the noise results in unusual values and the resulting index displays a strong positive value where it should physically be equally negative. Hence some issues arise in permanent water bodies (and ocean areas) occasionally being classified as not-water. This appears as speckle within large water bodies. A curious side effect of this behaviour is that shallow areas often display as having a higher water observation frequency than deep areas, apparently due to the improved signal to noise associated with the contribution of substrate reflectance. This is a subject for further investigation.

As a result, although this example yields an acceptable final image, showing water frequencies of roughly 100% in the middle of the lake, that is primarily because noise in many scenes lowers the count of clear observations, rather than it being a "true" 100%.
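The effect can be made concrete with a toy calculation (illustrative numbers only): frequency is computed as wet / (wet + dry), so observations where noisy deep water is flagged as neither wet nor dry simply drop out of the denominator, while noise misread as dry produces the speckle seen earlier.

```python
def frequency(obs):
    """Wet frequency = wet / (wet + dry); 'noisy' observations are
    excluded from both counts, mirroring the WOFS summary behaviour."""
    wet = obs.count("wet")
    clear = wet + obs.count("dry")
    return wet / clear if clear else float("nan")

ideal = ["wet"] * 20                    # what a perfect classifier would see
noisy = ["wet"] * 12 + ["noisy"] * 8    # noise knocks out 8 observations
speckle = ["wet"] * 12 + ["dry"] * 8    # noise misclassified as dry instead

print(frequency(ideal))    # 1.0 from 20 clear observations
print(frequency(noisy))    # still 1.0, but from only 12 clear observations
print(frequency(speckle))  # 0.6 -- speckle drags the frequency down
```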

In [76]:
# Compare the same two scenes band-by-band to investigate further
# NEED TO USE THE MODIFIED WOFS CODE TO LOOK DEEPER
# The WOFS indices use TM band numbering (band 2 = green, band 3 = red,
# band 4 = NIR, band 5 = SWIR1, band 7 = SWIR2), as in Table 2 of the paper

NDI_43 = (landsat_dataset.nbart_nir - landsat_dataset.nbart_red) / (landsat_dataset.nbart_nir + landsat_dataset.nbart_red)
NDI_52 = (landsat_dataset.nbart_swir_1 - landsat_dataset.nbart_green) / (landsat_dataset.nbart_swir_1 + landsat_dataset.nbart_green)
NDI_72 = (landsat_dataset.nbart_swir_2 - landsat_dataset.nbart_green) / (landsat_dataset.nbart_swir_2 + landsat_dataset.nbart_green)

fig, axs = plt.subplots(4, 5, figsize=(20,10), subplot_kw=dict(aspect="equal"))

# Force the colour ranges to show what we need
data_band_min = 7000
data_band_max = 9000

NDI_43_min = -0.1
NDI_43_max = 0.7

NDI_52_min = -0.2
NDI_52_max = 0.7

NDI_72_min = -0.3
NDI_72_max = 0.3

NDI_43_1 = NDI_43.sel(time=date_1,method='nearest').plot(ax=axs[0][0],cmap='Spectral',vmin=NDI_43_min,vmax=NDI_43_max)
NDI_52_1 = NDI_52.sel(time=date_1,method='nearest').plot(ax=axs[0][1],cmap='Spectral',vmin=NDI_52_min,vmax=NDI_52_max)
NDI_72_1 = NDI_72.sel(time=date_1,method='nearest').plot(ax=axs[0][2],cmap='Spectral',vmin=NDI_72_min,vmax=NDI_72_max)
BLUE_1 = landsat_dataset.nbart_blue.sel(time=date_1,method='nearest').plot(ax=axs[0][3],vmin=data_band_min,vmax=data_band_max)
GREEN_1 = landsat_dataset.nbart_green.sel(time=date_1,method='nearest').plot(ax=axs[0][4],vmin=data_band_min,vmax=data_band_max)
RED_1 = landsat_dataset.nbart_red.sel(time=date_1,method='nearest').plot(ax=axs[1][0],vmin=data_band_min,vmax=data_band_max)
NIR_1 = landsat_dataset.nbart_nir.sel(time=date_1,method='nearest').plot(ax=axs[1][1],vmin=data_band_min,vmax=data_band_max)
SWIR_1_1 = landsat_dataset.nbart_swir_1.sel(time=date_1,method='nearest').plot(ax=axs[1][2],vmin=data_band_min,vmax=data_band_max)
SWIR_2_1 = landsat_dataset.nbart_swir_2.sel(time=date_1,method='nearest').plot(ax=axs[1][3],vmin=data_band_min,vmax=data_band_max)

NDI_43_2 = NDI_43.sel(time=date_2,method='nearest').plot(ax=axs[2][0],cmap='Spectral',vmin=NDI_43_min,vmax=NDI_43_max)
NDI_52_2 = NDI_52.sel(time=date_2,method='nearest').plot(ax=axs[2][1],cmap='Spectral',vmin=NDI_52_min,vmax=NDI_52_max)
NDI_72_2 = NDI_72.sel(time=date_2,method='nearest').plot(ax=axs[2][2],cmap='Spectral',vmin=NDI_72_min,vmax=NDI_72_max)
BLUE_2 = landsat_dataset.nbart_blue.sel(time=date_2,method='nearest').plot(ax=axs[2][3],vmin=data_band_min,vmax=data_band_max)
GREEN_2 = landsat_dataset.nbart_green.sel(time=date_2,method='nearest').plot(ax=axs[2][4],vmin=data_band_min,vmax=data_band_max)
RED_2 = landsat_dataset.nbart_red.sel(time=date_2,method='nearest').plot(ax=axs[3][0],vmin=data_band_min,vmax=data_band_max)
NIR_2 = landsat_dataset.nbart_nir.sel(time=date_2,method='nearest').plot(ax=axs[3][1],vmin=data_band_min,vmax=data_band_max)
SWIR_1_2 = landsat_dataset.nbart_swir_1.sel(time=date_2,method='nearest').plot(ax=axs[3][2],vmin=data_band_min,vmax=data_band_max)
SWIR_2_2 = landsat_dataset.nbart_swir_2.sel(time=date_2,method='nearest').plot(ax=axs[3][3],vmin=data_band_min,vmax=data_band_max)

title_fontsize = 10
axs[0][0].set_title(f'NDI_43 - good ({date_1})',fontsize=title_fontsize)
axs[0][1].set_title(f'NDI_52 - good ({date_1})',fontsize=title_fontsize)
axs[0][2].set_title(f'NDI_72 - good ({date_1})',fontsize=title_fontsize)
axs[0][3].set_title(f'Blue - good ({date_1})',fontsize=title_fontsize)
axs[0][4].set_title(f'Green - good ({date_1})',fontsize=title_fontsize)
axs[1][0].set_title(f'Red - good ({date_1})',fontsize=title_fontsize)
axs[1][1].set_title(f'NIR - good ({date_1})',fontsize=title_fontsize)
axs[1][2].set_title(f'SWIR_1 - good ({date_1})',fontsize=title_fontsize)
axs[1][3].set_title(f'SWIR_2 - good ({date_1})',fontsize=title_fontsize)
axs[2][0].set_title(f'NDI_43 - poor ({date_2})',fontsize=title_fontsize)
axs[2][1].set_title(f'NDI_52 - poor ({date_2})',fontsize=title_fontsize)
axs[2][2].set_title(f'NDI_72 - poor ({date_2})',fontsize=title_fontsize)
axs[2][3].set_title(f'Blue - poor ({date_2})',fontsize=title_fontsize)
axs[2][4].set_title(f'Green - poor ({date_2})',fontsize=title_fontsize)
axs[3][0].set_title(f'Red - poor ({date_2})',fontsize=title_fontsize)
axs[3][1].set_title(f'NIR - poor ({date_2})',fontsize=title_fontsize)
axs[3][2].set_title(f'SWIR_1 - poor ({date_2})',fontsize=title_fontsize)
axs[3][3].set_title(f'SWIR_2 - poor ({date_2})',fontsize=title_fontsize)

for ax in axs.flat:
    ax.xaxis.set_visible(False)
    ax.yaxis.set_visible(False)
plt.tight_layout()
plt.show()
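The sign flip the paper describes is easy to reproduce numerically: for very dark water, a little sensor noise on near-zero reflectances can flip a normalised difference index from negative to positive (the values below are illustrative only, not taken from this scene):

```python
def ndi(a, b):
    """Normalised difference index (a - b) / (a + b)."""
    return (a - b) / (a + b)

# Clear deep water: NIR well below red -> strongly negative NDI_43.
print(ndi(200, 600))  # -0.5

# Same pixel when the signal is near zero: a small amount of noise on
# tiny values produces a large, even positive, index.
print(ndi(30, 20))    # 0.2, despite both bands being nearly dark
```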
In [64]:
# Helper function from https://github.com/opendatacube/odc-stats/blob/develop/odc/stats/plugins/wofs.py
def reduce(xx: xr.Dataset) -> xr.Dataset:
    nodata = -999
    count_some = xx.some.sum(axis=0, dtype="int16")
    count_wet = xx.wet.sum(axis=0, dtype="int16")
    count_dry = xx.dry.sum(axis=0, dtype="int16")
    count_clear = count_wet + count_dry
    frequency = safe_div(count_wet, count_clear, dtype="float32")

    count_wet.attrs["nodata"] = nodata
    count_clear.attrs["nodata"] = nodata

    is_ok = count_some > 0
    count_wet = keep_good_only(count_wet, is_ok)
    count_clear = keep_good_only(count_clear, is_ok)

    return xr.Dataset(
        dict(
            count_wet=count_wet,
            count_clear=count_clear,
            frequency=frequency,
        )
    )
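`reduce()` relies on `odc.algo.safe_div` to avoid dividing by zero where a pixel was never clearly observed. A minimal NumPy equivalent (a sketch of the intended semantics, not the odc implementation) is:

```python
import numpy as np

def safe_div_sketch(num, den, dtype="float32"):
    """Elementwise num / den, with NaN wherever den == 0
    (mirrors the intent of odc.algo.safe_div)."""
    num = np.asarray(num, dtype=dtype)
    den = np.asarray(den, dtype=dtype)
    out = np.full(num.shape, np.nan, dtype=dtype)
    np.divide(num, den, out=out, where=den != 0)
    return out

count_wet = [[0, 5], [3, 0]]
count_clear = [[18, 5], [6, 0]]  # 0 = pixel never clearly observed
print(safe_div_sketch(count_wet, count_clear))  # NaN where clear count is 0
```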
In [65]:
summary = reduce(wofl)
summary
Out[65]:
<xarray.Dataset>
Dimensions:      (latitude: 800, longitude: 801)
Coordinates:
  * latitude     (latitude) float64 4.041e+06 4.041e+06 ... 4.017e+06 4.017e+06
  * longitude    (longitude) float64 7.212e+05 7.212e+05 ... 7.452e+05 7.452e+05
    spatial_ref  int32 32611
Data variables:
    count_wet    (latitude, longitude) int16 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0
    count_clear  (latitude, longitude) int16 18 18 18 18 18 18 ... 0 0 0 0 20 0
    frequency    (latitude, longitude) float32 0.0 0.0 0.0 0.0 ... nan 0.0 nan
In [66]:
import matplotlib as mpl

# Use a copy of the reversed 'jet' colormap so that masked/NaN pixels render
# black. (Recent Matplotlib versions disallow mutating registered colormaps.)
jet_r = mpl.colormaps['jet_r'].copy()
jet_r.set_bad('black', 1.0)
In [67]:
# Plot of wet counts
# Areas of RED have experienced little or no water over the time series
# Areas of BLUE have experienced significant or constant water over the time series
summary.count_wet.plot(size=10,cmap = jet_r);
plt.title("Count of Samples Classified as Water")
plt.axis('off')
plt.show()

More information:

The plot below helps to highlight the issue: the number of clear observations is dramatically different over the water than over the land. This lake doesn't get significant fog or cloud, so the number of clear observations over the water should be similar to that over the surrounding land.

In [68]:
# Plot of clear counts
summary.count_clear.plot(size=10,cmap = jet_r);
plt.title("Count of Clear Observations")
plt.axis('off')
plt.show()

More information:

As a result of the lower wet and clear counts, the final percentage map looks reasonable, but it hides the fact that the underlying processing has problems.

In [69]:
# Plot of wet frequency
(summary.frequency*100).plot(cmap = jet_r, size=10)
plt.title("Percent of Samples Classified as Water")
plt.axis('off')
plt.show()
In [ ]: